feat: add SKIPPED async job status to skip jobs with no data to return #980

Draft
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1775485391-add-skipped-async-job-status

Contributor

devin-ai-integration bot commented Apr 6, 2026

Summary

Adds a new terminal SKIPPED status to the CDK's async job framework. When an API indicates a job has no data to return (e.g., Amazon SP-API's CANCELLED status), connectors can now map it to skipped instead of forcing it into failed or timeout — both of which trigger wasteful retries (up to 3x).

Behavior: SKIPPED partitions free their job budget and are silently dropped — no records are fetched, no retries are attempted, no errors are raised.
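The behavior described above can be pictured with a minimal sketch. It is not the CDK's orchestrator code: `JobStatus`, `drain_partitions`, the `budget` set, and the `retries` list are hypothetical stand-ins chosen for illustration, and the real `_process_skipped_partition` may be structured differently.

```python
from enum import Enum
from typing import Dict, Iterator, List, Set


class JobStatus(Enum):
    # Simplified stand-in for the CDK's AsyncJobStatus (assumption).
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    SKIPPED = "skipped"


def drain_partitions(
    statuses: Dict[str, JobStatus], budget: Set[str], retries: List[str]
) -> Iterator[str]:
    """Yield the ids of partitions whose records should be fetched."""
    for partition_id, status in statuses.items():
        if status is JobStatus.COMPLETED:
            budget.discard(partition_id)  # free the slot, then fetch records
            yield partition_id
        elif status is JobStatus.SKIPPED:
            # Free the slot and drop the partition: no yield, no retry, no error.
            budget.discard(partition_id)
        elif status in (JobStatus.FAILED, JobStatus.TIMED_OUT):
            retries.append(partition_id)  # keeps its slot and is queued for a retry
        # RUNNING partitions keep their slot and are polled again later.
```

In this sketch a skipped partition is indistinguishable from a completed one as far as the budget is concerned; the only difference is that nothing is yielded for it.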

Backward-compatible: The skipped key in AsyncJobStatusMap is optional (Optional[List[str]] = None). Existing connector manifests that don't declare it continue to work unchanged.
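To illustrate why the optional default keeps existing manifests working, here is a rough sketch that uses a plain dataclass as a stand-in for the generated Pydantic model. `StatusMapSketch` is a hypothetical name, and the API status strings other than `CANCELLED` are illustrative values, not taken from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatusMapSketch:
    # Required keys: every existing manifest already declares these.
    running: List[str]
    completed: List[str]
    failed: List[str]
    timeout: List[str]
    # New optional key: defaults to None when a manifest omits it.
    skipped: Optional[List[str]] = None


# An existing manifest that never mentions `skipped` still constructs fine.
legacy = StatusMapSketch(
    running=["IN_PROGRESS"], completed=["DONE"], failed=["FATAL"], timeout=["CANCELLED"]
)

# A manifest that opts in moves CANCELLED from `timeout` to `skipped`.
updated = StatusMapSketch(
    running=["IN_PROGRESS"], completed=["DONE"], failed=["FATAL"], timeout=[], skipped=["CANCELLED"]
)
```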

Files changed:

  • status.py — new SKIPPED enum member (terminal)
  • job_orchestrator.py — partition status aggregation, _process_skipped_partition, match case
  • declarative_component_schema.yaml — optional skipped property on AsyncJobStatusMap
  • declarative_component_schema.py — Pydantic model field
  • model_to_component_factory.py — status mapping + scoped None-guard that only allows None for optional fields (skipped); required fields raise ValueError if None
  • test_job_orchestrator.py — 6 new tests (partition status combos + orchestrator behavior)

Review & Testing Checklist for Human

  • Pydantic model vs codegen: declarative_component_schema.py appears to be auto-generated from the YAML schema. Confirm whether this manual edit will survive the next codegen run, or if the YAML change alone is sufficient. Run /poe build to regenerate and verify.
  • Partition status priority logic (AsyncPartition.status): Verify the ordering of checks is correct — particularly that {COMPLETED, SKIPPED} resolves to COMPLETED (not SKIPPED), and that SKIPPED doesn't shadow the FAILED/TIMED_OUT wildcard retry paths. The subset check statuses <= {AsyncJobStatus.COMPLETED, AsyncJobStatus.SKIPPED} is the key line.
  • Scoped None guard in _create_async_job_status_mapping: The guard only allows None for fields in _OPTIONAL_ASYNC_STATUS_FIELDS (currently {"skipped"}). Required fields (running, completed, failed, timeout) raise ValueError if None. This is not directly unit-tested — consider adding a test.
  • Existing tests that enumerate all AsyncJobStatus values (e.g., test_given_one_failed_job_when_status_then_return_failed creates one job per enum value): These now include a SKIPPED job. Verify the priority expectations still hold with the extra status in the mix.
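As a reviewing aid for the priority question in the checklist above, the following sketch shows one ordering of checks that is consistent with the test names added in this PR. It is an assumption about the shape of `AsyncPartition.status`, not a copy of it; `Status` and `partition_status` are hypothetical names, and the empty-input guard is an addition for the sketch only.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    # Simplified stand-in for AsyncJobStatus (assumption).
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    SKIPPED = "SKIPPED"


def partition_status(job_statuses: Iterable[Status]) -> Status:
    statuses = set(job_statuses)
    if not statuses:
        # Guard added for the sketch: an empty set is a subset of anything.
        raise ValueError("a partition needs at least one job")
    if statuses == {Status.SKIPPED}:
        return Status.SKIPPED  # every job was skipped
    if statuses <= {Status.COMPLETED, Status.SKIPPED}:
        return Status.COMPLETED  # at least one job completed, the rest were skipped
    if Status.FAILED in statuses:
        return Status.FAILED  # SKIPPED does not shadow the retry path
    if Status.TIMED_OUT in statuses:
        return Status.TIMED_OUT
    return Status.RUNNING
```

Because the subset check only passes when neither `FAILED`, `TIMED_OUT`, nor `RUNNING` is present, placing it before the failure checks does not change which partitions get retried in this sketch.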

Suggested test plan: Use the /test slash command to run connector tests with this CDK branch. Then manually verify with a connector that uses async jobs (e.g., source-amazon-seller-partner) by pinning the CDK to this branch and confirming SKIPPED-mapped statuses produce no records and no retries.

Notes

  • Motivated by oncall#11837: Amazon SP-API's CANCELLED means "no data to return" but the CDK currently forces it into timeout/failed, causing 3 wasteful retries that burn rate-limit budget and can cascade into FATAL errors.
  • Companion connector PR: airbytehq/airbyte#76093 — once this CDK change ships, the connector can switch from timeout: [CANCELLED] to skipped: [CANCELLED].

Link to Devin session: https://app.devin.ai/sessions/45ba0c82b9654fbb8ce037502492bd3d

Add a new terminal SKIPPED status to AsyncJobStatus that allows connectors
to indicate a job should be silently skipped (no records fetched, no retries,
no errors). This is useful for APIs like Amazon SP-API where CANCELLED means
'no data to return' and retrying is wasteful.

Changes:
- Add SKIPPED to AsyncJobStatus enum (terminal status)
- Add _process_skipped_partition to AsyncJobOrchestrator (frees job budget, no yield)
- Add SKIPPED handling in partition status aggregation
- Add optional 'skipped' key to AsyncJobStatusMap schema (backward-compatible)
- Add 'skipped' mapping in model_to_component_factory
- Handle None values in _create_async_job_status_mapping for optional fields
- Add unit tests for SKIPPED partition status and orchestrator behavior

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


github-actions bot commented Apr 6, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775485391-add-skipped-async-job-status#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775485391-add-skipped-async-job-status

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerates git-committed build artifacts, such as the Pydantic models generated from the manifest JSON schema (written in YAML).
  • /poe <command> - Runs any poe command in the CDK environment


github-actions bot commented Apr 6, 2026

PyTest Results (Fast)

4 018 tests  +6   4 007 ✅ +6   7m 48s ⏱️ +9s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit eaf016a. ± Comparison against base commit 5b14f41.

This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both.
Removed:
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_completed_jobs_when_status_then_return_running
Added:
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncJobOrchestratorTest ‑ test_given_skipped_does_not_retry
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncJobOrchestratorTest ‑ test_given_skipped_when_create_and_get_completed_partitions_then_skip_without_fetching_records
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_completed_and_skipped_jobs_when_status_then_return_completed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_completed_jobs_when_status_then_return_completed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_skipped_jobs_when_status_then_return_skipped
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_skipped_and_failed_jobs_when_status_then_return_failed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_skipped_and_running_jobs_when_status_then_return_running

♻️ This comment has been updated with latest results.

The generic None check could silently skip a required field that was
accidentally None. Now only fields in _OPTIONAL_ASYNC_STATUS_FIELDS
(currently just 'skipped') are allowed to be None. Required fields
(running, completed, failed, timeout) raise ValueError if None.

Co-Authored-By: bot_apk <apk@cognition.ai>
# This is an element of the dict because of the typing of the CDK but it is not a CDK status
continue

if api_statuses is None:
Contributor Author

Good catch! Fixed in eaf016a — the None guard is now scoped to only the optional skipped field via _OPTIONAL_ASYNC_STATUS_FIELDS = {"skipped"}. If a required field (running, completed, failed, timeout) is ever accidentally None, it will now raise a ValueError instead of being silently skipped.
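A rough sketch of the scoped guard described in this reply follows. Only `_OPTIONAL_ASYNC_STATUS_FIELDS` and its value come from the PR; `build_status_mapping`, the plain-dict input, and the use of field names as mapped values are simplifications assumed for illustration, and the real `_create_async_job_status_mapping` may differ.

```python
from typing import Any, Dict

# Name and contents as stated in the PR; everything else is a stand-in.
_OPTIONAL_ASYNC_STATUS_FIELDS = {"skipped"}


def build_status_mapping(status_map: Dict[str, Any]) -> Dict[str, str]:
    """Map each API status string to the CDK status field that declares it."""
    mapping: Dict[str, str] = {}
    for cdk_status, api_statuses in status_map.items():
        if cdk_status == "type":
            # Part of the dict because of the model typing, but not a CDK status.
            continue
        if api_statuses is None:
            if cdk_status in _OPTIONAL_ASYNC_STATUS_FIELDS:
                continue  # an omitted optional field is fine
            raise ValueError(f"Required status field '{cdk_status}' must not be None")
        for api_status in api_statuses:
            mapping[api_status] = cdk_status
    return mapping
```

With this shape, a manifest that omits `skipped` produces the same mapping as before, while a `None` in any of `running`, `completed`, `failed`, or `timeout` fails loudly instead of being dropped.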


github-actions bot commented Apr 6, 2026

PyTest Results (Full)

4 021 tests  +6   4 009 ✅ +6   11m 11s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit eaf016a. ± Comparison against base commit 5b14f41.

This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both.
Removed:
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_completed_jobs_when_status_then_return_running
Added:
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncJobOrchestratorTest ‑ test_given_skipped_does_not_retry
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncJobOrchestratorTest ‑ test_given_skipped_when_create_and_get_completed_partitions_then_skip_without_fetching_records
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_completed_and_skipped_jobs_when_status_then_return_completed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_completed_jobs_when_status_then_return_completed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_only_skipped_jobs_when_status_then_return_skipped
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_skipped_and_failed_jobs_when_status_then_return_failed
unit_tests.sources.declarative.async_job.test_job_orchestrator.AsyncPartitionTest ‑ test_given_skipped_and_running_jobs_when_status_then_return_running

Contributor

Daryna Ishchenko (darynaishchenko) commented Apr 7, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/24082870177
